Abstract: In this paper we consider the problem of separating noisy instantaneous linear mixtures of document images in
the Bayesian framework. The source image is modeled hierarchically by a latent labeling process, representing
the common classification of document objects among different color channels, and by the intensity process of
pixels given the class labels. A Potts Markov random field is used to model regional regularity of the
classification labels inside object regions. Local dependency between neighboring pixels can also be accounted for by
a smoothness constraint on their intensities. Within the Bayesian approach, all unknowns, including the sources,
the classification, the mixing coefficients and the distribution parameters of these variables, are estimated from
their posterior laws. The corresponding Bayesian computations are done by an MCMC sampling algorithm.
Results from experiments on synthetic and real image mixtures are presented to illustrate the performance of the
proposed method.



1 INTRODUCTION 

Blind source separation (BSS) has been an active research
topic in signal and image processing in recent years.
It considers separating a set of unknown signals from
their observed mixtures, under reasonable assumptions
on the form of the mixing process: linear or nonlinear,
instantaneous or convolutive, under- or over-determined,
noisy or noiseless, and so on. In all cases, however,
the mixing coefficients remain unknown and have to be
estimated along with the original source signals.

Various methods and models have been proposed for
the BSS task. Among them, Principal Component Analysis
(PCA) seeks orthogonal directions of maximum variance
exhibited by the data as source axes, while Independent
Component Analysis (ICA) (Hyvarinen et al., 200 ), in its
basic form, assumes statistical independence of the sources
and a linear mixing process, and seeks an inverse linear
transformation matrix that, applied to the data, achieves
maximum mutual independence between the output components.
Both methods exploit basic statistical characteristics of
the source signals to achieve the separation, which makes
them generalize well and remain robust in cases where
only weak assumptions, such as uncorrelatedness or
independence, can be made about the sources. Some variant
algorithms have also been proposed to accommodate certain
relaxations of the model assumptions, such as nonlinearity
or noise (Harmeling, 2003; Almeida, 2005). However, in many
other cases, various types of prior information are available
or needed to regularize the essentially ill-posed BSS problem.
Compared with PCA and ICA, the Bayesian framework allows
convenient introduction of such prior constraints on the
sources and the mixing coefficients and, more importantly,
supports flexible structuring and integration of multiple
hierarchical clues for the separation purpose.

In the field of image processing, BSS approaches
are widely employed to separate or segment mixed
images observed from, for example, satellite and
hyper-spectral imaging (Snoussi and Mohammad-Djafari, 2004;
Parra et al., 2000; Macias-Macias et al., 2003),
medical imaging (Calhoun and Adali, 2006;
Snoussi and Calhoun, 2005), and other superimpositions
of natural images (Bronstein et al., 2005;
Castella and Pesquet, 2004).



This paper focuses on one specific type of images,
document images, where superimposition of two images
usually appears as a major type of degradation
encountered in digitization (Sharma, 2001) or in
ancient documents (Drira, 2006). The former degradation
usually occurs as an artifact when scanning a double-sided
document: the text printed on the back side shows through
the non-opaque medium and is mixed with the front-side text.
Fig. 1(a) shows one such example. The latter cause of text
superimposition, usually called bleed-through, can often be
observed in old documents due to ink blurring or
penetration, as illustrated by Fig. 1(b). Other forms
of overlapped patterns, such as underwriting and watermarks,
are also common. Though the actual underlying mixing
process may be quite complicated and may differ among
the various mixture forms, the linear mixing model
usually serves as a reasonable approximation and offers
analytical and computational simplicity; it is thus
adopted in most document separation cases.

To separate document image mixtures, the common
PCA and ICA algorithms can be used and have
shown their effectiveness in detecting independent
document features such as watermarks, as examined in
(Tonazzini et al., 2004), where each source was
considered as a random signal sequence as a whole, without
further internal structuring. The Bayesian framework
has also been used before for document separation, as
in (Tonazzini et al., 2006), where the source is modeled
by a Markov Random Field on the pixel values
to account for local smoothness inside one object, together
with an extra line process enforcing the discontinuity
at object edges.

In this contribution, we propose a solution to
jointly separate and segment linearly mixed document
images. Besides considering the mixture in a single
grayscale channel, we address the joint separation of
multi-channel mixtures of multiple sources. In section
2, we give the probabilistic formulation of the problem.
In section 3, the algorithm for Bayesian estimation of
the model parameters is described. In section 4,
simulation results of the proposed algorithm are shown on
both synthetic and real images.

Figure 1: Examples of mixed document images: a) show-through
mixture; b) bleed-through mixture.



2 MODEL ASSUMPTION AND 
FORMULATION 

Document images are created by various digitization
methods from a wide variety of documents. Commonly,
a color scanner can be used to produce three
different views of one document in the red, green,
and blue channels. With detectors working at
non-visible wavelengths such as infrared and ultraviolet,
even more channels of data can be obtained,
depending on the objects of interest in the
documents.

Given observations of M different mixtures, either 
in grayscale or multiple channels, our work is thus 
to obtain N corresponding source images (normally 
M > N) in the same pixel format as the observations. 

2.1 Data Model 

In this work, the observations are M registered images
$(X_i)_{i=1\ldots M}$, which are defined on the same set
of pixels $\mathcal{R}$: $X_i = \{x_i(r)\}_{r \in \mathcal{R}}$.
The observations are noisy linear instantaneous mixtures of
N source images $(S_j)_{j=1\ldots N}$, also defined on
$\mathcal{R}$, following the data generation model given by:

$$x(r) = A\,s(r) + n(r), \quad r \in \mathcal{R} \qquad (1)$$

where $A = (a_{ij})_{M \times N}$ is the unknown mixing matrix,
$n(r)$ is a set of independent zero-mean white Gaussian
noise terms, one for each observation, with variances
$\sigma_\epsilon^2 = (\sigma_{\epsilon_1}^2, \ldots, \sigma_{\epsilon_M}^2)$,
and $x(r)$ and $s(r)$ are the observation and source vectors
at pixel r, respectively. Let $S = \{s(r), r \in \mathcal{R}\}$,
$X = \{x(r), r \in \mathcal{R}\}$, and denote the noise covariance
matrix by $R_\epsilon = \mathrm{diag}[\sigma_{\epsilon_1}^2, \ldots, \sigma_{\epsilon_M}^2]$.
We then have the Gaussian distribution for the observations
given the sources and the mixing parameters:

$$p(X \mid S, A, R_\epsilon) = \prod_{r} \mathcal{N}(A\,s(r), R_\epsilon) \qquad (2)$$
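
As an illustration of the data model, the following minimal sketch evaluates the logarithm of the likelihood (2) for a toy mixture. The function names `mix` and `log_likelihood`, the 2x2 matrix and the pixel values are all hypothetical, and the fixed `offsets` vector merely stands in for one small noise realization.

```python
import math

def mix(A, s):
    """Noise-free part of the mixing model: the vector A s for one pixel."""
    return [sum(a * v for a, v in zip(row, s)) for row in A]

def log_likelihood(X, S, A, noise_var):
    """Sum over pixels of log N(x(r) | A s(r), diag(noise_var))."""
    total = 0.0
    for x, s in zip(X, S):
        for x_i, mean_i, var in zip(x, mix(A, s), noise_var):
            total -= 0.5 * (math.log(2.0 * math.pi * var) + (x_i - mean_i) ** 2 / var)
    return total

# Toy data: three pixels, two sources, two observations (illustrative values).
A_true = [[1.0, 0.5], [0.2, 1.0]]
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
offsets = [0.01, -0.01]  # stand-in for one small noise draw per observation
X = [[m + e for m, e in zip(mix(A_true, s), offsets)] for s in S]
noise_var = [0.01, 0.01]

ll_true = log_likelihood(X, S, A_true, noise_var)
ll_identity = log_likelihood(X, S, [[1.0, 0.0], [0.0, 1.0]], noise_var)
print(ll_true > ll_identity)
```

The true mixing matrix explains the toy observations better than the identity matrix, which is the property the sampler relies on when it updates A.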

2.2 Source Model 

We model the distribution of pixel intensity for each
source image (and for each color channel) by a Mixture
of Gaussians (MoG), whose components correspond
to the object types (or classes), each of which exhibits
roughly equal pixel values. For example, the simplest
model may consist of two components, one for the
foreground text and the other for the blank background.
Furthermore, to allow imposing constraints on the
distribution of class labels, for every source $S_j$ we represent
the class labels by a set of discrete hidden variables
$Z_j = \{z_j(r), r \in \mathcal{R}\}$, with $z_j(r)$ taking values in
$\{1, \ldots, K_j\}$, where $K_j$ is the total number of classes
in image $S_j$. In the following, we assume all $\{K_j\}$
equal to the same value K.

Given pixel labels, pixels of different classes can
be reasonably assumed independent. For
the pixels inside a given class, there are usually
two choices:

1. We may assume pixel intensities are conditionally
independent given their labels;

2. Alternatively, we may explicitly take into account
the local dependency between neighboring pixels
of the same class.



In the first choice, the distribution of pixel r in the
j-th source is modeled by:

$$p(s_j(r) \mid z_j(r) = k) = \mathcal{N}(\mu_{jk}, \sigma_{jk}^2) \qquad (3)$$

where $\mu_{jk}$ and $\sigma_{jk}^2$ are the mean and variance of the
k-th Gaussian component of the j-th source. Assuming
independence between different sources and denoting
the set of labels corresponding to every source by
$Z = \{Z_j, j = 1 \ldots N\}$, we have:

$$p(S \mid Z) = \prod_{j} \prod_{k} \prod_{\{r:\, z_j(r) = k\}} p(s_j(r) \mid z_j(r) = k)$$

which is by (3) also a Gaussian and spatially separable
on r.
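
Under the first choice, a source image can be simulated pixel by pixel from its label field. The sketch below is only an illustration: `sample_source` is a hypothetical helper, the class means and variances are made-up values, and labels are 0-based here for indexing convenience (the text uses 1 to K).

```python
import random

def sample_source(labels, means, variances, rng):
    """Draw s(r) ~ N(means[k], variances[k]) independently, with k = labels[r]."""
    return [
        [rng.gauss(means[k], variances[k] ** 0.5) for k in row]
        for row in labels
    ]

# Two classes: 0 = bright background, 1 = dark text (illustrative values).
labels = [[0, 0, 1, 1],
          [0, 1, 1, 0]]
means = [0.9, 0.2]
variances = [0.001, 0.002]
source = sample_source(labels, means, variances, random.Random(1))
```

Because each pixel depends only on its own label, all pixels can be drawn in a single pass, which is the spatial separability mentioned above.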

In the second choice, the local dependency can be
accounted for by extra smoothness constraints, such as
on the mean value, between neighboring pixels. We first
assign a binary-valued contour flag $q_j(r)$ to every pixel
r of every source j, which is deterministically computed
by:

$$q_j(r) = \begin{cases} 0 & \text{if } z_j(r') = z_j(r),\ \forall r' \in \mathcal{V}(r) \\ 1 & \text{else} \end{cases}$$

where $\mathcal{V}(r)$ denotes the neighbor sites of the site r,
so that $q_j(r) = 1$ marks a contour pixel.

Then, based on the value of the contour flag and
possibly the current values of the neighboring pixels, the
distribution of the intensity of an individual pixel is
formulated as:

$$p(s_j(r) \mid z_j(r) = k,\ s_j(r'), r' \in \mathcal{V}(r)) = \mathcal{N}(\bar{s}_j(r), \bar{\sigma}_j^2(r)) \qquad (4)$$

with

$$\bar{s}_j(r) = q_j(r)\,\mu_{jk} + (1 - q_j(r))\,\frac{1}{|\mathcal{V}_{jk}(r)|} \sum_{r' \in \mathcal{V}_{jk}(r)} s_j(r')$$

$$\bar{\sigma}_j^2(r) = q_j(r)\,\sigma_{jk}^2 + (1 - q_j(r))\,\sigma_r^2$$

where $\mathcal{V}_{jk}(r)$ denotes the intersection of $\mathcal{V}(r)$ with
the site set $\mathcal{R}_{jk} = \{r : z_j(r) = k\}$, and $\sigma_r^2$ is the a priori
variance of pixel values inside a region. Eq. (4)
states that at a contour, pixel intensities follow
the Gaussian distribution whose parameters are determined
by the class labels as in (3), while inside a region
the distribution parameters are computed from
the neighboring pixels. Note that under this assumption,
$p(S \mid Z)$ is no longer separable on r, but
with the parallel Gibbs sampling scheme proposed in
(Feron and Mohammad-Djafari, 2005), it can still be
simulated efficiently.
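
The second choice can be sketched as follows, under two assumptions that the text does not fix: a 4-connected neighborhood, and the convention that the contour flag equals 1 at a contour pixel (a site with at least one differently labeled neighbor) and 0 inside a region. All function names and numeric values are hypothetical.

```python
def neighbors(r, c, n_rows, n_cols):
    """4-connected neighbor sites of (r, c) inside the image (assumed neighborhood)."""
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < n_rows and 0 <= j < n_cols]

def contour_flag(labels, r, c):
    """1 if some neighbor carries a different label (contour pixel), else 0."""
    nb = neighbors(r, c, len(labels), len(labels[0]))
    return int(any(labels[i][j] != labels[r][c] for i, j in nb))

def conditional_params(source, labels, r, c, means, variances, region_var):
    """Mean and variance of s(r) given its label and its neighbors."""
    k = labels[r][c]
    if contour_flag(labels, r, c) == 1:
        # Contour pixel: fall back to the class-level Gaussian.
        return means[k], variances[k]
    # Interior pixel: every neighbor shares label k, so average over all of them.
    nb = neighbors(r, c, len(labels), len(labels[0]))
    nb_mean = sum(source[i][j] for i, j in nb) / len(nb)
    return nb_mean, region_var

labels = [[0, 0, 0, 1],
          [0, 0, 0, 1],
          [0, 0, 0, 1]]
source = [[0.9, 0.8, 0.9, 0.2],
          [0.9, 0.7, 0.9, 0.2],
          [0.9, 0.8, 0.9, 0.2]]
means, variances, region_var = [0.9, 0.2], [0.001, 0.002], 0.0005

interior = conditional_params(source, labels, 1, 1, means, variances, region_var)
contour = conditional_params(source, labels, 1, 2, means, variances, region_var)
```

Pixel (1, 1) is interior, so its conditional mean is the average of its four neighbors; pixel (1, 2) touches the other class and therefore uses the class mean and variance.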

As a commonly observed property of visual ob- 
jects, pixels belonging to the same object usually con- 
nect to each other in a neighborhood of space, form- 
ing several connected regions of uniformly classified 
pixels, for instance, the multiple components consti- 
tuting a text. By class labels defined earlier, this im- 
plies regional smoothness of the spatial distribution of 
class labels. This can be naturally modeled by a prior 
Potts Markov Random Field for every label process 



$$p(z_j(r), r \in \mathcal{R}) \propto \exp\Big[\beta_j \sum_{r \in \mathcal{R}} \sum_{r' \in \mathcal{V}(r)} \delta(z_j(r) - z_j(r'))\Big] \qquad (5)$$

The parameter $\beta_j$ reflects the degree of smoothing
interactions between pixels and controls the expected
size of the regions. In our work, all $\beta_j$ are
assumed equal and assigned an empirical value within
[1.5, 2.0].
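
To make the effect of the Potts prior concrete, the sketch below computes the unnormalized log-prior of a label field and the local conditional distribution of one label given its neighbors. It assumes a 4-connected neighborhood and that the double sum in (5) visits every site and each of its neighbors, so that every neighboring pair is counted twice; the function names are hypothetical.

```python
import math

def neighbors(r, c, n_rows, n_cols):
    """4-connected neighbor sites of (r, c) inside the image (assumed neighborhood)."""
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < n_rows and 0 <= j < n_cols]

def potts_log_prior(labels, beta):
    """Unnormalized log of the Potts prior: beta times the number of agreeing ordered pairs."""
    n_rows, n_cols = len(labels), len(labels[0])
    agree = 0
    for r in range(n_rows):
        for c in range(n_cols):
            agree += sum(labels[i][j] == labels[r][c]
                         for i, j in neighbors(r, c, n_rows, n_cols))
    return beta * agree

def local_conditional(labels, r, c, n_classes, beta):
    """p(z(r) = k | neighbor labels) for each class k."""
    nb = neighbors(r, c, len(labels), len(labels[0]))
    counts = [sum(labels[i][j] == k for i, j in nb) for k in range(n_classes)]
    # Factor 2: each pair (r, r') appears once from r and once from r'.
    weights = [math.exp(2.0 * beta * n) for n in counts]
    total = sum(weights)
    return [w / total for w in weights]

labels = [[0, 0, 1],
          [0, 1, 1],
          [0, 0, 1]]
probs = local_conditional(labels, 1, 1, 2, 1.5)
```

With three of the four neighbors in class 0 and a smoothing parameter of 1.5, the center pixel is assigned to class 0 with probability above 0.99, which shows how strongly values in the quoted range favor homogeneous regions.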

2.3 Multiple Channels 

When multi-channel image data are considered, there
are several options for the processing model. We can
separate the sources independently in each
channel and merge the results by some measure at
the end. Or, we may consider joint demixing of all
channels. In the latter case, the mixing model still
admits several alternatives:

a) all channels are equally mixed with the same mix- 
ing matrix; 

b) the mixing occurs separately in each channel with 
different mixing matrices; 

c) cross-channel mixing is assumed to be present. 

In case a), samples from different channels of
the same observation can be concatenated for the
estimation of the mixing coefficients, which is similar
to the monochrome case. In case c), an expanded
mixing matrix $A_{ML \times NL}$ (supposing L channels)
is used for all channels of all sources.

In this work, we assume model b), where
the mixing in different channels is mutually
independent and each channel has its own separate
coefficients. Thus, in RGB color format, the sources
and observations are actually $\{S_j^r, S_j^g, S_j^b\}_{j=1\ldots N}$ and
$\{X_i^r, X_i^g, X_i^b\}_{i=1\ldots M}$. Correspondingly, there are
$\{A^r, R_\epsilon^r, \mu_{jk}^r, \ldots, A^b, R_\epsilon^b, \mu_{jk}^b\}$ and so on. But for each
source $S_j$, only one classification field $Z_j$ is maintained
and shared by all channels, as a natural way
to enforce a common segmentation among different
channels. This two-level hierarchical source
model, which also facilitates introducing segmentation
constraints such as discontinuity and local regional
dependency, is the main difference from the work of
(Tonazzini et al., 2006), where a one-level MRF model
of the sources is defined on the single-channel pixel
intensities along with an explicit binary edge process.



3 BAYESIAN ESTIMATION OF
MODEL PARAMETERS

The unknown variables we want to estimate in the
models given above are $\{S, Z, A, \theta\}$, with $\theta$ representing all
hyperparameters. The Bayesian estimation approach
consists of deriving the posterior distribution of all
the unknowns given the observations and then, based
on this distribution, employing appropriate estimators
such as the Maximum A Posteriori (MAP) or the Posterior
Mean (PM). With our model assumptions,
this posterior distribution can be expressed as:

$$p(S, Z, \theta \mid X) \propto p(X \mid S, A, R_\epsilon)\, p(S \mid Z, \theta_s)\, p(Z)\, p(\theta) \qquad (6)$$

where $\theta_s = \{(\mu_{jk}, \sigma_{jk}^2),\ j = 1 \ldots N,\ k = 1 \ldots K\}$ and
$\theta = \{A, R_\epsilon, \theta_s\}$.

3.1 Prior Assignments for Model
Parameters

According to the linear mixing model and the Gaussian
assumptions, we choose the corresponding conjugate
priors for the model hyperparameters.

• Gaussian for source means

$$\mu_{jk} \sim \mathcal{N}(\mu_{jk0}, \sigma_{jk0}^2)$$

• Inverse Gamma for source variances

$$\sigma_{jk}^2 \sim \mathcal{IG}(\alpha_{jk0}, \beta_{jk0})$$

• Inverse Wishart for noise covariance

$$R_\epsilon^{-1} \sim \mathcal{W}_i(\alpha_{\epsilon 0}, \beta_{\epsilon 0})$$

In this work, we assign a uniform prior to A for simplicity,
expressing no preference among the mixing coefficients,
while in other cases prior distributions such as the Gamma
may be used to enforce positivity.

3.2 Estimation by MCMC Sampling

Given the joint a posteriori distribution (6) of all unknown
variables, we use the Posterior Means as the
estimates. Since direct integration over Z
is intractable, MCMC methods are employed in the
actual Bayesian computations. In our work, a Gibbs
sampling algorithm is used to generate a set of samples
for every variable to be estimated, according to
its full conditional a posteriori distribution given all
other variables fixed to their current values. Then, after
a certain number of burn-in runs, sample means from further
iterations are used as the Posterior Mean estimates
of the unknowns. The algorithm takes the form:

Repeat until convergence:

1. simulate $S' \sim p(S \mid Z, \theta, X)$

2. simulate $Z' \sim p(Z \mid S', \theta, X)$

3. simulate $\theta' \sim p(\theta \mid Z', S', X)$
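
The three-step cycle, together with burn-in and posterior-mean averaging, can be sketched generically as below. This is not the model of this paper: to keep the example runnable, the three conditionals belong to a toy Gaussian chain (s ~ N(x, 1), z | s ~ N(s, 1), theta | z ~ N(z, 1)) in which all variables are scalars and every posterior mean equals x. The function `gibbs_posterior_means` and all parameter values are hypothetical.

```python
import random

def gibbs_posterior_means(draw_s, draw_z, draw_theta, z0, theta0, n_iter, burn_in):
    """Cycle S -> Z -> theta and average the draws made after burn-in."""
    z, theta = z0, theta0
    sums, kept = [0.0, 0.0, 0.0], 0
    for it in range(n_iter):
        s = draw_s(z, theta)        # step 1: S given current Z and theta
        z = draw_z(s, theta)        # step 2: Z given the new S
        theta = draw_theta(z, s)    # step 3: theta given the new Z and S
        if it >= burn_in:
            sums = [sums[0] + s, sums[1] + z, sums[2] + theta]
            kept += 1
    return [v / kept for v in sums]

# Full conditionals of the toy chain (stand-ins for the paper's conditionals).
rng = random.Random(0)
x = 2.0

def draw_s(z, theta):
    return rng.gauss((x + z) / 2.0, 0.5 ** 0.5)

def draw_z(s, theta):
    return rng.gauss((s + theta) / 2.0, 0.5 ** 0.5)

def draw_theta(z, s):
    return rng.gauss(z, 1.0)

post_means = gibbs_posterior_means(draw_s, draw_z, draw_theta,
                                   z0=0.0, theta0=0.0, n_iter=20000, burn_in=2000)
```

In the actual algorithm S and Z are images and theta is a set of parameters, so the running sums would be accumulated element-wise, but the control flow is the same.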

Below we give the expressions of related conditional 
probability distributions. 

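• Sampling $Z \sim p(Z \mid X, S, \theta) \propto p(X \mid Z, \theta)\, p(Z)$:

$$p(X \mid Z, \theta) = \prod_{r} p(x(r) \mid z(r), \theta) = \prod_{r} \mathcal{N}(A\,m_{z(r)},\ A\,\Sigma_{z(r)} A' + R_\epsilon)$$

where $m_{z(r)} = [\mu_{1 z_1(r)}, \ldots, \mu_{N z_N(r)}]'$ and
$\Sigma_{z(r)} = \mathrm{diag}[\sigma_{1 z_1(r)}^2, \ldots, \sigma_{N z_N(r)}^2]$.

Notice that $p(Z) = \prod_{j=1}^{N} p(z_j)$ and, as mentioned earlier,
$p(z_j)$ takes the form of a Potts MRF as in (5).
An inner Gibbs sampling is then used to simulate
$z_j$, with the likelihood $p(x(r) \mid z(r), A, \theta)$ marginalized
over all configurations of $\{z_{j'}(r), j' \neq j\}$.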

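• Sampling $S \sim p(S \mid X, Z, \theta)$:

$$p(S \mid X, Z, \theta) \propto p(X \mid S, A, R_\epsilon)\, p(S \mid Z, \theta) = \prod_{r} \mathcal{N}(m_s^{post}(r), R_s^{post}(r))$$

with

$$R_s^{post}(r) = \big[A' R_\epsilon^{-1} A + \Sigma_{z(r)}^{-1}\big]^{-1}$$

$$m_s^{post}(r) = R_s^{post}(r)\,\big[A' R_\epsilon^{-1} x(r) + \Sigma_{z(r)}^{-1} m_{z(r)}\big]$$

where $m_{z(r)}$ and $\Sigma_{z(r)}$ are the vector of class means and the
diagonal matrix of class variances selected by the labels at pixel r.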
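
For intuition, the source update can be written out in the scalar special case of one source and one observation (N = M = 1), obtained by completing the square in the product of the Gaussian likelihood (2) and the class-conditional prior (3). The function name and the numbers are hypothetical.

```python
def source_posterior_scalar(x, a, noise_var, class_mean, class_var):
    """Posterior mean and variance of s given x = a*s + n and s ~ N(class_mean, class_var)."""
    precision = a * a / noise_var + 1.0 / class_var   # likelihood precision + prior precision
    post_var = 1.0 / precision
    post_mean = post_var * (a * x / noise_var + class_mean / class_var)
    return post_mean, post_var

post_mean, post_var = source_posterior_scalar(
    x=2.0, a=2.0, noise_var=1.0, class_mean=0.0, class_var=1.0)
print(post_mean, post_var)
```

Here the observation alone would suggest s = x / a = 1, while the class prior pulls toward 0; the posterior mean 0.8 lies between the two, weighted by their precisions.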



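• Sampling $R_\epsilon$:

$$p(R_\epsilon \mid X, S, A) \propto p(X \mid S, A, R_\epsilon)\, p(R_\epsilon)$$

Since we assign an inverse Wishart distribution to $p(R_\epsilon)$,
which is the conjugate prior for the likelihood (2),
$R_\epsilon$ is a posteriori sampled from:

$$R_\epsilon^{-1} \sim \mathcal{W}_i(\alpha_\epsilon, \beta_\epsilon), \quad \alpha_\epsilon = \tfrac{1}{2}(|\mathcal{R}| - n), \quad \beta_\epsilon = \tfrac{|\mathcal{R}|}{2}\big(R_{xx} - R_{xs} R_{ss}^{-1} R_{xs}'\big)$$

where the sample statistics are $R_{xx} = \frac{1}{|\mathcal{R}|} \sum_r x_r x_r'$,
$R_{xs} = \frac{1}{|\mathcal{R}|} \sum_r x_r s_r'$ and $R_{ss} = \frac{1}{|\mathcal{R}|} \sum_r s_r s_r'$.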



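• Sampling $A \sim p(A \mid X, S, R_\epsilon)$:

$$p(A \mid X, S, R_\epsilon) \propto p(X \mid S, A, R_\epsilon)\, p(A)$$

Given a uniform or Gaussian prior for A, the posterior
distribution of A is Gaussian:

$$\mathrm{Vec}(A) \sim \mathcal{N}(\mu_A, R_A)$$

$$\mu_A = \mathrm{Vec}(R_{xs} R_{ss}^{-1}), \quad R_A = \frac{1}{|\mathcal{R}|}\, R_{ss}^{-1} \otimes R_\epsilon$$

where $\otimes$ is the Kronecker product and Vec(.) represents
the column-stacking operation.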
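
Under a uniform prior, the posterior mean of A is the product of the cross-moment matrix of observations and sources with the inverse second-moment matrix of the sources, which is the least-squares estimate. The following sketch computes it for two sources in pure Python; the helper names and the toy data are hypothetical, and the example is noise-free so that the estimate recovers the mixing matrix exactly.

```python
def second_moments(X, S):
    """Sample statistics R_xs = mean of x s' and R_ss = mean of s s' over pixels."""
    n_pix, n_obs, n_src = len(X), len(X[0]), len(S[0])
    R_xs = [[sum(x[i] * s[j] for x, s in zip(X, S)) / n_pix for j in range(n_src)]
            for i in range(n_obs)]
    R_ss = [[sum(s[i] * s[j] for s in S) / n_pix for j in range(n_src)]
            for i in range(n_src)]
    return R_xs, R_ss

def inverse_2x2(M):
    """Inverse of a 2x2 matrix (this sketch is restricted to two sources)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def mixing_posterior_mean(X, S):
    """Posterior mean of A under a uniform prior: R_xs times the inverse of R_ss."""
    R_xs, R_ss = second_moments(X, S)
    inv = inverse_2x2(R_ss)
    return [[sum(row[m] * inv[m][j] for m in range(2)) for j in range(2)]
            for row in R_xs]

A_true = [[1.0, 0.5], [0.2, 1.0]]
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
X = [[sum(a * v for a, v in zip(row, s)) for row in A_true] for s in S]
A_hat = mixing_posterior_mean(X, S)
```

A full sampler would additionally draw Vec(A) around this mean with the Kronecker-structured covariance given above, which requires a multivariate Gaussian draw and is omitted here.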

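• Sampling $(\mu_{jk}, \sigma_{jk}^2)$:

With (Z, S) sampled in earlier steps and conjugate
priors assigned, the means $\mu_{jk}$ and the variances
$\sigma_{jk}^2$ can be sampled from their respective posteriors as
follows:

$$\mu_{jk} \mid s_j, z_j, \sigma_{jk}^2 \sim \mathcal{N}\Big(v_{jk}\Big[\frac{\mu_{jk0}}{\sigma_{jk0}^2} + \frac{1}{\sigma_{jk}^2} \sum_{r \in \mathcal{R}_{jk}} s_j(r)\Big],\ v_{jk}\Big), \quad v_{jk} = \Big[\frac{1}{\sigma_{jk0}^2} + \frac{n_{jk}}{\sigma_{jk}^2}\Big]^{-1}$$

and

$$\sigma_{jk}^2 \mid s_j, z_j, \mu_{jk} \sim \mathcal{IG}\Big(\alpha_{jk0} + \frac{n_{jk}}{2},\ \beta_{jk0} + \frac{1}{2} \sum_{r \in \mathcal{R}_{jk}} (s_j(r) - \mu_{jk})^2\Big)$$

where $(\mu_{jk0}, \sigma_{jk0}^2)$ and $(\alpha_{jk0}, \beta_{jk0})$ are the prior
parameters, the label region is $\mathcal{R}_{jk} = \{r : z_j(r) = k\}$ and the
region size is $n_{jk} = |\mathcal{R}_{jk}|$.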
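
The two conjugate updates for one class of one source can be sketched as follows, assuming the conditionally independent intensity model and the shape/scale parameterization of the inverse Gamma. The function names, prior values and pixel values are hypothetical; the inverse Gamma draw uses the standard fact that the reciprocal of a Gamma variate with the same shape and rate equal to the scale is inverse Gamma distributed.

```python
import random

def mean_posterior(pixels, var, prior_mean, prior_var):
    """Gaussian posterior (mean, variance) of a class mean given the class pixels."""
    n = len(pixels)
    post_var = 1.0 / (1.0 / prior_var + n / var)
    post_mean = post_var * (prior_mean / prior_var + sum(pixels) / var)
    return post_mean, post_var

def variance_posterior(pixels, mean, prior_shape, prior_scale):
    """Inverse Gamma posterior (shape, scale) of a class variance given the class pixels."""
    shape = prior_shape + len(pixels) / 2.0
    scale = prior_scale + 0.5 * sum((s - mean) ** 2 for s in pixels)
    return shape, scale

def draw_inverse_gamma(shape, scale, rng):
    """If g ~ Gamma(shape, rate=scale), then 1/g ~ InverseGamma(shape, scale)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)  # gammavariate takes a scale argument

rng = random.Random(0)
pixels = [0.2, 0.25, 0.15, 0.2]   # intensities of the pixels currently labeled k
post_mean, post_var = mean_posterior(pixels, var=0.01, prior_mean=0.5, prior_var=1.0)
new_mean = rng.gauss(post_mean, post_var ** 0.5)
shape, scale = variance_posterior(pixels, mean=0.2, prior_shape=2.0, prior_scale=0.01)
new_var = draw_inverse_gamma(shape, scale, rng)
```

With four pixels and a weak prior, the posterior mean (about 0.2007) is dominated by the pixel average rather than by the prior mean of 0.5.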




Figure 2: Separation of synthetic image mixtures: a) original
sources; b) image mixtures; c) demixed sources; d) classification
labels.



Fig. 2 shows the synthetic image mixtures, the demixed
sources and the label fields.

The real test image was scanned from a duplex-printed
paper, where show-through causes the superimposition
of text. The separation result is shown in
Fig. 3.

Figure 3: Separation of real show-through image mixtures:
a) image mixtures; b) demixed sources; c) classification
labels.



4 SIMULATION RESULTS 

For evaluating the performance of the proposed algo- 
rithm, we use both synthetic and real images in the 
test. The synthetic images were generated according 
to the model setting that each source is composed of 
pixels of two classes (text and background) and two 
source images are linearly mixed in every color chan- 
nel independently to produce two observation images. 
This was done in three steps: 

1. Two binary ($K_1 = K_2 = 2$) text images were scanned
from real documents or created with graphic tools.
They were used as the class labels $Z_{j=1,2}$ for each
source;

2. With known means and variances for the pixel values
of each class, the source images were generated
according to (3);

3. For each color channel, a randomly selected $A_{2 \times 2}$
was used to mix the sources, and finally white
Gaussian noise with covariance $R_\epsilon$ was added (SNR = 20 dB).
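
The three steps can be sketched for a single channel as below. Several details are assumptions, since the text does not specify them: random labels stand in for the binary text images, the class parameters and the range of the mixing coefficients are made up, and the SNR is taken as the ratio of the mean squared noise-free mixture to the noise variance. All function names are hypothetical.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise whose variance gives the requested SNR in dB."""
    power = sum(v * v for v in signal) / len(signal)
    noise_var = power / (10.0 ** (snr_db / 10.0))
    noisy = [v + rng.gauss(0.0, math.sqrt(noise_var)) for v in signal]
    return noisy, noise_var

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a noise-free signal and its noisy version."""
    power = sum(v * v for v in clean) / len(clean)
    err = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(power / err)

rng = random.Random(0)
n_pix = 2000
# Step 1: binary label fields (random stand-ins for scanned text images).
label_fields = [[rng.randrange(2) for _ in range(n_pix)] for _ in range(2)]
# Step 2: sources drawn from class-conditional Gaussians.
means, variances = [0.9, 0.2], [0.001, 0.002]
sources = [[rng.gauss(means[k], math.sqrt(variances[k])) for k in lab]
           for lab in label_fields]
# Step 3: random 2x2 mixing, then noise at 20 dB.
A = [[rng.uniform(0.5, 1.0), rng.uniform(0.1, 0.5)],
     [rng.uniform(0.1, 0.5), rng.uniform(0.5, 1.0)]]
clean = [[row[0] * a + row[1] * b for a, b in zip(sources[0], sources[1])] for row in A]
observed = [add_noise_at_snr(c, 20.0, rng) for c in clean]
snrs = [measured_snr_db(c, o[0]) for c, o in zip(clean, observed)]
```

For a color image the same procedure would be repeated per channel with a separate mixing matrix, while reusing the same label fields.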



For comparison, we also applied the FastICA
algorithm (Hyvarinen, 1999) to the sample images
with a typical parameter set. The results on the
show-through example of Fig. 3 are shown in Fig. 4. All
three channels of the two observed mixtures were
used simultaneously as inputs to the ICA algorithm.
The two demixed sources can be found in two of the
six independent components (ICs) output, while
the other four ICs usually contain unintended
noise-like signals, which, along with the permutation
ambiguity of the ICA algorithm, make it difficult
to reconstruct a color representation of the sources.
On the other hand, when fewer color channels are
exploited in demixing, we noticed that the separation
result does not necessarily degrade or improve, owing
to the possible presence of cross-channel correlations.

The MCMC computation involved in the proposed
Bayesian separation method is time-consuming. For
the example image of 300x240 pixels in Fig. 3, which
is small relative to ordinary document sizes and
resolutions, the typical computation time of the
experimental implementation can amount to hours without
specific optimizations. However, various computing
alternatives such as Mean Field and variational
approximations can be exploited to achieve higher
efficiency.

Figure 4: Separation results by ICA; (b) other ICs containing noise-like signals.



5 CONCLUSION 

We proposed a Bayesian approach for separating
noisy linear mixtures of document images. For the source
images, we considered a hierarchical model with
a hidden label variable z representing the common
classification of objects among multiple color channels,
and a Potts-Markov prior was employed for the
class labels to impose local regularity constraints. We
showed how Bayesian estimates of all unknowns of
interest can be computed by MCMC sampling from
their posterior distributions given the observations. We
then illustrated the feasibility of the proposed algorithm
for joint separation and segmentation by tests
on sample images.